Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up project discovery #242

Closed
wants to merge 4 commits into from
Closed

Speed up project discovery #242

wants to merge 4 commits into from

Conversation

oehme
Copy link
Contributor

@oehme oehme commented Apr 9, 2019

I profiled the project discovery phase of a large (2000 module) Maven build today, because it was taking several minutes before it actually started executing goals. I found some low-level hotspots that this PR fixes. It shaves off about 2.5min of the startup time of this build. See the individual commit messages for more details. There are still other hotspots left, but they are in the plexus-interpolation project, for which I'll open a separate PR.

  • Make sure there is a JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MNG-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MNG-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@michael-o
Copy link
Member

michael-o commented Apr 9, 2019

I will have a look at this tonight.

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

Note that 3 core-it tests were failing for me, but they also fail with Maven 3.6.0, so they are probably environment dependent. Are there any specific requirements for these tests? I'd really like to see a fully green build :)

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very awesome.
Do you have any number to share ?

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

Project discovery on the project took ~5min before, now down to ~2.5min. Still a long way to go, but a good improvement :)

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

The last commit seems to be problematic, some core-its around ArtifactHandler are failing. I'm not yet sure I understand why - Can the artifact handler for a given type change over time? I thought the idea is to only add additional handlers for new types?

Nevermind, I got it - We can't remember the DefaultArtifactHandler created for unknown types, as that might later be replaced with a better handler by an extension.

All integration tests are passing now.

Copy link
Member

@michael-o michael-o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create seperate JIRA tickets for each change. Did you also try change by change and saw how it affects the build time? I'd like to see broken down numbers for your case.

if ( tok.hasMoreTokens() )
{
qualifier = tok.nextToken();
fallback = Pattern.compile( "\\d+" ).matcher( qualifier ).matches();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the char array approach as before?

Copy link
Contributor Author

@oehme oehme Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it wasn't a hotspot in my tests. I can do it for consistency, but I can't prove that it is a problem :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not, if you know that this works better that with regex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally only change things if I can measure they are a problem. Otherwise one can quickly get into a situation where things get worse or the code gets less readable without a benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted it, as in this case it also improves readability.

return null;
}
long longValue = Long.parseLong( s );
if ( longValue > Integer.MAX_VALUE )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, change in semantics with long. Dislike on this one. Why all the hassle, why not use NumberUtils from Commons Lang 3?

Copy link
Contributor Author

@oehme oehme Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it throws an exception internally, which is what we're trying to avoid in the first place here. We don't want to throw exceptions on common code paths.

Also, we're not changing semantics here, we are still returning an int. Going through long is only done for the very common case of big numbers that would fail with a NumberFormatException on Integer.parseInt

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @oehme, however it is probably worth adding a comment that explains this (the reason behind tryParseInt method and the reason behind parseLong).

@Override
public int hashCode()
{
return value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Objects#hashCode() on this?

@@ -211,6 +235,30 @@ public int compareTo( Item item )
}
}

@Override
public boolean equals( Object o )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Objects#equals() on this?

}

@Override
public int hashCode()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Objects#hashCode() on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What for? It's a single object, not a list.

@@ -272,6 +320,29 @@ public int compareTo( Item item )
}

@Override
public boolean equals( Object o )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here...

@@ -384,6 +455,29 @@ public int compareTo( Item item )
}

@Override
public boolean equals( Object o )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here...

@oehme
Copy link
Contributor Author

oehme commented Apr 10, 2019

Please create seperate JIRA tickets for each change.

Why is this overhead necessary? I wouldn't know what to write in the ticket besides "look at the commit message" ;)

Did you also try change by change and saw how it affects the build time? I'd like to see broken down numbers for your case.

I did, each change removes a similar chunk of the build time. They are all very much worth it. I can send you the original and after profiler snapshot if you'd like.

@michael-o
Copy link
Member

Please create seperate JIRA tickets for each change.

Why is this overhead necessary? I wouldn't know what to write in the ticket besides "look at the commit message" ;)

This isn't overhead. Every change needs to be documented as a ticket. They go into the release notes. We don't expect users to read the log to understand what has been changed.
It is enough to write a sentence in the ticket which justifies the change.

@oehme oehme force-pushed the master branch 3 times, most recently from 186586c to f4b6e48 Compare April 11, 2019 08:56
Use a simple list of allowed characters instead of a regex.
@oehme oehme force-pushed the master branch 2 times, most recently from dca4f99 to 2b34a4e Compare April 11, 2019 09:14
oehme added 3 commits April 11, 2019 11:15
By not allocating the canonical representation for equals/hashcode,
but instead using the items we already have. This saves both time
and memory.

I left the canonical field around for testing purposes.
Use if-statements instead of exception-based control flow.
Throwing exceptions is very expensive and should not be used
for normal flow.
Otherwise we have to go through the whole sisu engine again,
which is very slow, because it does a linear scan.
@oehme
Copy link
Contributor Author

oehme commented Apr 11, 2019

I've created the issues and addressed your comments above, please have another look.

@oehme
Copy link
Contributor Author

oehme commented Apr 11, 2019

Here are two profiles, with the optimized methods colored in pink and their total contribution to CPU time shown at the bottom:

Before:
Screenshot 2019-04-11 at 12 02 31

After:
Screenshot 2019-04-11 at 12 02 54

As you can see, optimizing StringSearchModelInterpolator will be a great next step.

@hboutemy
Copy link
Member

Note that 3 core-it tests were failing for me, but they also fail with Maven 3.6.0, so they are probably environment dependent. Are there any specific requirements for these tests? I'd really like to see a fully green build :)

if you follow the instructions https://maven.apache.org/core-its/core-it-suite/, it should work easily on any configuration (as it works on ASF Jenkins and on many Maven developers). But there may be weak ITs that are fragile against something in your environment: what are the failing ITs and your configuration, please?

@hboutemy
Copy link
Member

another side: I'd like to be able to reproduce your tests and profiling.
Would it be feasible to create a sample build and write some quick explanations on doing such profiling, please?

@oehme
Copy link
Contributor Author

oehme commented Apr 21, 2019

@rfscholte, @britter and I are meeting about this next week. How about you join in as well and I can walk you through some of the methodology?

@hboutemy
Copy link
Member

I had to re-launch the build on ASF Jenkins for MNG-6632 2 times before it went ok https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven/job/MNG-6632/
and I had to relaunch it one time for MNG-6630 https://builds.apache.org/view/M-R/view/Maven/job/maven-box/job/maven/job/MNG-6630/
looks like it was a temporary issue on the servers...

I reviewed also the commits: everything looks perfect to me

I'll approve these changes on GitHub and finish the approval on dev list: IMHO, we'll need to find a simpler process for the future...

@hboutemy hboutemy requested review from michael-o and hboutemy April 23, 2019 17:11
@rfscholte
Copy link
Contributor

We'll probably need to fix https://issues.apache.org/jira/browse/MNG-6643 afterwards, i.e. ComparableVersion should not depend on any third party libraries so it can easily be executed from commandline.

@khmarbaise
Copy link
Member

@rfscholte, @britter and I are meeting about this next week. How about you join in as well and I can walk you through some of the methodology?

Can you tell more accurate when you like to meet?

@oehme
Copy link
Contributor Author

oehme commented Apr 24, 2019

@khmarbaise the meeting was yesterday afternoon - @britter will post a summary on the mailing list soon.

@britter
Copy link
Member

britter commented Apr 24, 2019

@khmarbaise the summary is here: https://lists.apache.org/thread.html/240bec41fd91580027661a167dbde7aa001ff03aa0b20ef779f927ce@%3Cdev.maven.apache.org%3E

If we do this kind of calls again I will make sure more people get a chance to join. It was kind of improvised over the easter weekend. Sorry!

@oehme
Copy link
Contributor Author

oehme commented Apr 24, 2019

@hboutemy Only one integration test still fails locally for me (an master too), the mng6386BaseUriProperty(itMNG6386UnicodeChars) case. I'll see if I can find out why, it's not a big deal for me though.

Thanks for approving the changes. I was wondering one thing: Is using Guava allowed in the core module? It's already a transitive dependency of Maven and it could simplify some of the code I wrote for this PR.

Copy link
Contributor

@belingueres belingueres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but in the following private methods: validate30RawProfileActivation(), validate20RawDependenciesSelfReferencing() and validateEffectiveModelAgainstDependency(), the "request" parameters are never used, so they could be safely deleted.

@hboutemy
Copy link
Member

PR merged, and additional commit for explanation added

thanks a lot

@hboutemy hboutemy closed this Apr 27, 2019
@oehme
Copy link
Contributor Author

oehme commented Apr 27, 2019

Thanks @hboutemy! Can you comment on my earlier question? It would help me make my future contributions as concise as possible:

Is using Guava allowed in the core module? It's already a transitive dependency of Maven and it could simplify some of the code I wrote for this PR.

@rfscholte
Copy link
Contributor

I prefer not to use guava, we've seen way too many backwords compatibility issues with this library. IIRC it is used by Sisu, I hope that's the only one.

}
}

private boolean isValidId( String id )
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be static? (in order to prevent accidental access to instance fields)

gnodet pushed a commit to gnodet/maven that referenced this pull request Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants